Chapter 3

Title: Integrative multi-omics reveal glial signatures associated with accelerated cognitive decline in Alzheimer’s disease

Overview:

  • Data Integration: Using Omix, transcriptomics and proteomics data are integrated to identify modules representing distinct biological mechanisms in a discovery cohort.

  • Projection on ROSMAP Data: These modules are projected onto publicly available early AD (Braak III-IV) data from the ROSMAP cohort.

  • Molecular Subgroup Identification: Multi-omics clustering is employed to identify molecular subgroups in early AD.

  • Progression Analysis: The subgroups are analysed for differences in disease progression using Kaplan-Meier survival analysis.

  • Disease mechanisms activity assessment: The activity of biological mechanisms (module eigenvalues) identified in the discovery cohort is compared between the molecular subgroups in the ROSMAP cohort.

Summary of methods:

  • Data Pre-processing: Quality control, normalisation, and denoising of RNA and proteomics data.

  • Integration Models: MOFA for bulk transcriptomics-proteomics integration and iCluster for molecular subtyping.

  • Downstream analyses:

    • Community detection within multi-omics transcriptomics-proteomics co-expression networks.

    • Transcriptomics-proteomics module eigenvalue computation and correlation with neuropathological features.

    • Cell type and pathway enrichment analyses.

Impact:

This analysis offers a deepened understanding of the biological mechanisms underlying glial activation and cognitive decline in AD, facilitating the identification of novel biomarkers and potential therapeutic targets. The pipeline also establishes a scalable framework for integrating multi-omics data in other neurodegenerative conditions.

Some of these analyses are available in my pre-print: medRxiv 2024.08.27.24312641; doi: https://doi.org/10.1101/2024.08.27.24312641

Omix package:

Part I - Transcriptomics-proteomics integration in the discovery cohort using Omix

The goal of Omix is to provide tools in R to build a complete analysis workflow for integrative analysis of data generated from multi-omics platform.

  • Generate a multi-omics object using MultiAssayExperiment.

  • Quality control of single-omics data.

  • Formatting, normalisation, denoising of single-omics data.

  • Separate single-omics analyses.

  • Integration of multi-omics data for combined analysis.

  • Publication quality plots and interactive analysis reports based of shinyApp.

Currently, Omix supports the integration of bulk transcriptomics and bulk proteomics.

Running Omix

The Omix pipeline requires the following input:

The following datasets are required to run the Omix pipeline:

Dataset Description Format
rawdata_rna RNA raw counts, genes as rows, samples as columns CSV
rawdata_protein Protein abundances, proteins as rows, samples as columns CSV
map_rna Mapping of RNA sample IDs to metadata CSV
map_protein Mapping of protein sample IDs to metadata CSV
metadata_rna RNA-specific metadata CSV
metadata_protein Protein-specific metadata CSV
individual_metadata Individual-level metadata for both assays CSV

Call required packages for workflow:

library(Omix)
library(ggplot2)
library(readr)
library(dplyr)
library(tidyr)
library(DT)
library(survival)
library(survminer)

First, we must load the data from Omix for the workflow:

outputDir <- tempdir()
ctd_fp <-file.path(outputDir, "ctd.rds")
ensembl_fp  <-file.path(outputDir, "ensembl.rds")


download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ctd-2.rds",destfile=ctd_fp)
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ensembl_mappings_human.tsv",destfile=ensembl_fp)


ctd <-  readRDS(paste0(outputDir,'/ctd.rds'))
ensembl <-read.delim(file=paste0(outputDir,'/ensembl.rds'), sep = '\t', header = TRUE)
rna_qc_data_matrix <- NULL

Download omics data for analysis

outputDir <- tempdir()

download_data <- function(url, dest) {
  download.file(url = url, destfile = dest, mode = "wb")
}

# Define file paths
files <- list(
  raw_counts = file.path(outputDir, "raw_counts.csv"),
  raw_proteomics = file.path(outputDir, "raw_proteomics.csv"),
  map_transcriptomics = file.path(outputDir, "map_transcriptomics.csv"),
  map_proteomics = file.path(outputDir, "map_proteomics.csv"),
  metadata_transcriptomics = file.path(outputDir, "metadata_transcriptomics.csv"),
  metadata_proteomics = file.path(outputDir, "metadata_proteomics.csv"),
  sample_metadata = file.path(outputDir, "sample_metadata.csv")
)

urls <- list(
  raw_counts = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/raw_counts.csv",
  raw_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/raw_proteomics.csv",
  map_transcriptomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/map_transcriptomics.csv",
  map_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/map_proteomics.csv",
  metadata_transcriptomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/metadata_transcriptomics.csv",
  metadata_proteomics = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/metadata_proteomics.csv",
  sample_metadata = "https://raw.githubusercontent.com/eleonore-schneeg/ProjectData/main/sample_metadata.csv"
)

# Download files
Map(download_data, urls, files)


raw_counts<-  read.csv(files$raw_counts,header=T, stringsAsFactors = F, row.names=1 , check.names = FALSE)
raw_proteomics<-  read.csv(files$raw_proteomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
map_transcriptomics<-  read.csv(files$map_transcriptomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
map_proteomics<-  read.csv(files$map_proteomics,header=T, stringsAsFactors = F, row.names=1 ,check.names = FALSE)
metadata_transcriptomics<-  read.csv(files$metadata_transcriptomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
metadata_proteomics<-  read.csv(files$metadata_proteomics,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)
sample_metadata<-  read.csv(files$sample_metadata,header=T, stringsAsFactors = F, row.names=1, check.names = FALSE)

Sanity checks

all(rownames(metadata_transcriptomics) == colnames(raw_counts))
## [1] TRUE
all(rownames(metadata_proteomics) == colnames(raw_proteomics))
## [1] TRUE

Raw rna counts

rawdata_rna is a data-frame of raw counts, with features as rows and samples as columns

print(raw_counts[1:5,1:5])
##                 19891053-MTEMP 19891053-SOM 19920001-MTEMP 19920001-SOM
## ENSG00000000003            144          100            417          374
## ENSG00000000005              1            0              3            0
## ENSG00000000419            241          155            204          217
## ENSG00000000457            279          203            326          338
## ENSG00000000460             94           82            169          180
##                 19920194-MTEMP
## ENSG00000000003            201
## ENSG00000000005              1
## ENSG00000000419            294
## ENSG00000000457            363
## ENSG00000000460            120

Raw protein abundance

rawdata_protein is a data-frame of raw protein abundances, with features as rows and samples as columns

print(raw_proteomics[10:15,10:15])
##                    DPM13s19_SSC_P3WF10_001 20130400_MTG_P1WH9_001
## MAP1LC3B;MAP1LC3B2                 76.7033                56.6484
## PGP                                27.3285                24.4034
## BTBD17                                  NA                     NA
## WIPF3                                   NA                62.9174
## IGLON5                                  NA                     NA
## TUBAL3                           8863.2900              7282.9700
##                    20130400_SSC_P1WH10_001 A048s13_MTG_P2WE9_001
## MAP1LC3B;MAP1LC3B2                146.2390              117.9950
## PGP                                19.7622               29.5640
## BTBD17                                  NA                    NA
## WIPF3                              68.8211                    NA
## IGLON5                             10.2975               15.9789
## TUBAL3                           8365.1800             8269.4000
##                    A048s13_SSC_P2WE10_001 A064s13_MTG_P2WF8_001
## MAP1LC3B;MAP1LC3B2                117.574               98.0735
## PGP                                    NA               32.5456
## BTBD17                                 NA                    NA
## WIPF3                                  NA               85.5537
## IGLON5                                 NA               16.5443
## TUBAL3                          13098.500             9600.5800

Maps

Maps are data frame containing two columns: primary and colname. The primary column should be the individual ID the individual metadata, and the colname the matched sample names from the raw matrices columns. If sample and individual ids are the same, maps aren’t needed (primary and colnames are the same).

print(head(map_transcriptomics))
##        primary        colname
## 1 19891053-MTG 19891053-MTEMP
## 2 19891053-SSC   19891053-SOM
## 3 19920001-MTG 19920001-MTEMP
## 4 19920001-SSC   19920001-SOM
## 5 19920194-MTG 19920194-MTEMP
## 6 19920194-SSC   19920194-SOM
print(head(map_proteomics))
##       primary                colname
## 1  PDC016-MTG  PDC016_MTG_P3WH10_001
## 2  PDC016-SSC   PDC016_SSC_P4WA1_001
## 3  PDC026-MTG   PDC026_MTG_P4WA3_001
## 4  PDC026-SSC   PDC026_SSC_P4WA4_001
## 5 A019/12-MTG A019s12_MTG_P2WD10_001
## 6 A019/12-SSC  A019s12_SSC_P2WE1_001

Technical metadata

Technical metadata are data-frames contain the column colname that should match the sample names in the raw matrices, and any additional columns related to technical artefacts like batches

print(head(metadata_transcriptomics))
##                     primary        colname  seq_batch
## 19891053-MTEMP 19891053-MTG 19891053-MTEMP IGFQ001167
## 19891053-SOM   19891053-SSC   19891053-SOM IGFQ001167
## 19920001-MTEMP 19920001-MTG 19920001-MTEMP IGFQ001167
## 19920001-SOM   19920001-SSC   19920001-SOM IGFQ001167
## 19920194-MTEMP 19920194-MTG 19920194-MTEMP IGFQ001167
## 19920194-SOM   19920194-SSC   19920194-SOM IGFQ001167
print(head(metadata_proteomics))
##                          sample_id            sample_name batch
## PDC016_MTG_P3WH10_001   PDC016-MTG  PDC016_MTG_P3WH10_001    P3
## PDC016_SSC_P4WA1_001    PDC016-SSC   PDC016_SSC_P4WA1_001    P4
## PDC026_MTG_P4WA3_001    PDC026-MTG   PDC026_MTG_P4WA3_001    P4
## PDC026_SSC_P4WA4_001    PDC026-SSC   PDC026_SSC_P4WA4_001    P4
## A019s12_MTG_P2WD10_001 A019/12-MTG A019s12_MTG_P2WD10_001    P2
## A019s12_SSC_P2WE1_001  A019/12-SSC  A019s12_SSC_P2WE1_001    P2

Individual metadata

individual_metadata contains individual level metadata, where one column matches the primary column in maps.

library(table1)
pvalue <- function(x, ...) {
    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- factor(rep(1:length(x), times=sapply(x, length)))
    if (is.numeric(y)) {
        # For numeric variables, perform a standard 2-sample t-test
        p <- t.test(y ~ g)$p.value
    } else {
        # For categorical variables, perform a chi-squared test of independence
        p <- chisq.test(table(y, g))$p.value
    }
    # Format the p-value, using an HTML entity for the less-than sign.
    # The initial empty string places the output on the line below the variable label.
    c("", sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
sample_metadata$diagnosis=as.factor(sample_metadata$diagnosis)
sample_metadata$brain_region=as.factor(sample_metadata$brain_region)

table1(~ Braak + PMD + age + trem2 + amyloid + AT8 + PHF1 | diagnosis,
    data=sample_metadata[sample_metadata$brain_region=="MTG",], overall=F, extra.col=list(`P-value`=pvalue))
AD
(N=33)
Control
(N=23)
P-value
Braak
3 4 (12.1%) 0 (0%) <0.001
4 2 (6.1%) 0 (0%)
5 9 (27.3%) 0 (0%)
5TO6 6 (18.2%) 0 (0%)
6 8 (24.2%) 0 (0%)
N/A 4 (12.1%) 0 (0%)
0 0 (0%) 5 (21.7%)
1 0 (0%) 4 (17.4%)
1TO2 0 (0%) 2 (8.7%)
2 0 (0%) 12 (52.2%)
PMD
Mean (SD) 36.4 (21.4) 31.2 (16.9) 0.315
Median [Min, Max] 30.0 [9.00, 84.0] 25.0 [9.00, 88.0]
age
Mean (SD) 80.4 (10.9) 80.5 (8.22) 0.951
Median [Min, Max] 83.0 [43.0, 94.0] 80.0 [65.0, 94.0]
trem2
CV 12 (36.4%) 18 (78.3%) 0.0048
TREM2 21 (63.6%) 5 (21.7%)
amyloid
Mean (SD) 4.90 (4.02) 0.764 (0.818) <0.001
Median [Min, Max] 4.72 [0.0577, 13.0] 0.585 [0.0456, 2.61]
Missing 17 (51.5%) 14 (60.9%)
AT8
Mean (SD) 1.66 (2.31) 0.118 (0.304) 0.018
Median [Min, Max] 0.688 [0.00615, 8.74] 0.0140 [0.00223, 0.929]
Missing 17 (51.5%) 14 (60.9%)
PHF1
Mean (SD) 1.62 (1.21) 0.0892 (0.222) <0.001
Median [Min, Max] 1.84 [0.00839, 4.00] 0.0105 [0.000519, 0.679]
Missing 17 (51.5%) 14 (60.9%)

Step one - Generate a MultiAssayExperiment object

To run Omix, we first need to generate a multi-omics object The package currently supports transcriptomics and proteomics bulk data only.

#Generate multiassay object
multiomics_regional=generate_multiassay(rawdata_rna =raw_counts,
                                        rawdata_protein = raw_proteomics,
                                        individual_to_sample=FALSE,
                                        map_rna = map_transcriptomics,
                                        map_protein = map_proteomics,
                                        metadata_rna = metadata_transcriptomics,
                                        metadata_protein = metadata_proteomics,
                                        individual_metadata = sample_metadata,
                                        map_by_column = 'sample_id',
                                        rna_qc_data=FALSE,
                                        rna_qc_data_matrix=NULL,
                                        organism='human')
## ✔ Ensembl ID conversion to gene symbol
## ✔ Retrieval of gene biotype
## ✔ RNA raw data loaded
## class: SummarizedExperiment 
## dim: 62656 112 
## metadata(1): metadata
## assays(1): rna_raw
## rownames(62656): ENSG00000000003 ENSG00000000005 ... ENSG00000291316
##   ENSG00000291317
## rowData names(3): ensembl_gene_id gene_name gene_biotype
## colnames(112): 19891053-MTEMP 19891053-SOM ... IGF122241 IGF122246
## colData names(3): primary colname seq_batch
## ✔ Protein raw data loaded
## class: SummarizedExperiment 
## dim: 3228 112 
## metadata(1): metadata
## assays(1): protein_raw
## rownames(3228):
##   IGKV2-28;IGKV2-29;IGKV2-30;IGKV2-40;IGKV2D-26;IGKV2D-28;IGKV2D-29;IGKV2D-30;IGKV2D-40
##   IGKV3-11;IGKV3D-11 ... FAM169A SEC23IP
## rowData names(1): gene_name
## colnames(112): PDC016_MTG_P3WH10_001 PDC016_SSC_P4WA1_001 ...
##   20196928_MTG_P2WD4_001 20196928_SSC_P2WD5_001
## colData names(3): sample_id sample_name batch
## ✔ MultiAssayExperiment object generated!

The MultiAssayExperiment object was succesfully created. The following steps will process and perform QC on each omic layers of the multiomics_object object.

Step two - Process transcriptomics data

multiomics_regional=process_rna(multiassay=multiomics_regional,
                                transformation='vst',
                                protein_coding=FALSE,
                                min_count = 10,
                                min_sample = 0.5,
                                dependent =  "diagnosis",
                                levels = c("Control","AD"),
                                covariates=c('age','sex','PMD'),
                                filter=TRUE,
                                batch_correction=TRUE,
                                batch="seq_batch",
                                remove_sample_outliers= FALSE)
## ✔ NORMALISATION & TRANSFORMATION
## ✔ GENE FILTERING
## ✔ QC parameters selected require genes to have at least 50 % of samples with counts of 10 or more
## ✔ 42425 / 62656 genes were dropped
## ✔ 20231 genes kept for analysis
## ✔ VST TRANSFORMATION
## ✔ BATCH CORRECTION
## ✔ Using Limma on seq_batch to remove technical artefacts, and age as biological confoundersUsing Limma on seq_batch to remove technical artefacts, and sex as biological confoundersUsing Limma on seq_batch to remove technical artefacts, and PMD as biological confounders
## ✔ Transcriptomics data processed!
## ✔ Processing parameters saved in metadata

Step three - Process proteomics data

multiomics_regional=process_protein(
  multiassay=multiomics_regional,
  filter=TRUE,
  min_sample = 0.5,
  dependent =  "diagnosis",
  levels = c("Control","AD"),
  imputation = 'minimum_value',
  remove_feature_outliers= FALSE,
  batch_correction= TRUE,
  batch="batch",
  correction_method="median_centering",
  remove_sample_outliers=FALSE,
  denoise=TRUE,
  covariates=c('PMD','sex','age'))
## ✔ SCALING NORMALIZATION
## ✔ FILTERING
## ✔ 283 / 3228 proteins filtered
## ✔ 2945 proteins kept for analysis
## ✔ IMPUTATION
## ✔ Imputation of remaining missing values based on
## 50% of minimum level of abundance for each protein
## ✔ BATCH CORRECTION
## ✔ DENOISING BIOLOGICAL COVARIATES
## ✔ Using Limma on PMD as biological confoundersUsing Limma on sex as biological confoundersUsing Limma on age as biological confounders

Step four - Vertical integration

Omix supports a range of vertical integration models:

  • Possible integration methods are MOFA,DIABLO,sMBPLS,iCluster,MEIFESTO

  • The choice of the integration models depends on the research use case of interest.

  • Here we display the use of Omix on a popular integration method, MOFA or Multi-Omics Factor analysis (Argelaguet et al. 2018).

In this workflow, we proceed with a multi-omics integration of 56 brain Alzheimer’s disease (AD) and Control samples coming from two brain regions

  • The somatosensory cortext (SOM)
  • The middle frontal gyrus (MTG)

The MTG is known to be affected earlier during AD progression, while SOM at later stages. Using these two regions as pseudotemporal proxi in the integrative process, we are able to gain a deeper understanding of biological mechanisms that occur during AD progression.

Transcriptomics-proteomics integration using MOFA (Multi-Omics Factor Analysis)

multiomics_regional=vertical_integration(multiassay=multiomics_regional,
                                         slots = c(
                                           "rna_processed",
                                           "protein_processed"
                                         ),
                                         integration='MOFA',
                                         ID_type = "gene_name",
                                         dependent='diagnosis',
                                         intersect_genes = FALSE,
                                         num_factors = 15,
                                         scale_views = TRUE,
                                         most_variable_feature=TRUE)

Optional: load a pretrained model

Since package version slightly affect model outputs, we reload a pre-trained model.

multiomics_regional <- readRDS("~/MSD/multiomics_regional_pretrained.rds") # load pre-trained model 

Step five - Post integration downstream analyses

Omix provides a range of built-in downstream analyses functions and visualisations. All downstream analyses will be performed on the integrated object stored in the integrated slot.

Load integrated object

integrated_object=multiomics_regional@metadata$integration$MOFA
metadata=integrated_object@samples_metadata
plot=correlation_heatmap(integrated_object,
                         covariates=c("age","PMD",'AD','MTG','SOM',
                                      'amyloid','AT8','PHF1',
                                      'Early','Mid','Late'))

Factor explorations

Variance explained

MOFA2::plot_variance_explained(integrated_object, max_r2=5)

Factors 14 is chosen downstream analysis:

  • Strongly correlated with neuropathological variables including PHF1 and amyloid.

  • Some level of shared variance across modalities

Factor 1, 7 and 11 are also chosen for downstream analysis in the pre-print, though are not shown in this workflow.

Extract features that are driving Factor 14 based on feature weights

Weights vary from -1 to +1, and provide a score for how strong each feature relates to each factor, hence allowing a biological interpretation of the latent factors. Features with no association with the factor have values close to zero, while genes with strong association with the factor have large absolute values.

The sign of the weight indicates the direction of the effect:

  • A positive weight indicates that the feature has higher levels in the samples with positive factor values.

  • A negative weight indicate higher levels in samples with negative factor values.

The extract_weigthsfunction enable to extract the weights on the desired factor at a defined absolute threshold (1.5 SD from the mean), and return an object with positive and negative weights above and below this threshold respectively. It also returns QC plots

  • A distribution of the feature weights for each omic layer at the designated factor

  • The relationship between the feature/sense_check_variable correlation and weights. High weights should coincide with stronger correlation if the sense_check_variable is an important driver of variation in the designated factor.

weights14=extract_weigths(integrated_object,
                          factor=14,
                          threshold=1.5,
                          sense_check_variable='PHF1')

Weights distribution in Factor 14

weights14$distribution_plot$rna  

weights14$distribution_plot$protein 

Correlation to PHF1 and weights in Factor 14

weights14$weights_cor_plot$rna

weights14$weights_cor_plot$protein

Negative weights in factor 14: Up-regulated in later stage AD

A negative weight indicates that the feature has higher levels in the samples with negative factor values, which in this analysis coincides with later Braak stages.

integrated_object@samples_metadata$Braak=as.factor(integrated_object@samples_metadata$Braak)
MOFA2::plot_factor(integrated_object, 
            factors = 14, 
            color_by = "Braak",
            add_violin = TRUE,
            dodge = TRUE
)

Multi-omics network

Based on the features (genes and proteins) selected to be more than 1.5 SD away from the mean, we build a co-expression network of genes and proteins. An edge is drawn between features that have a correlation => 0.3 (moderate correlation)

Weights14_up=multiomics_network(multiassay=multiomics_regional,
                                  list=weights14$weights$ranked_weights_positive,
                                  correlation_threshold =0.3,
                                  filter_string_50= FALSE)

Community detection within multi-omic network

The next step of the analysis is to find densily co-expressed communities of proteins and RNAs within the multi-omics network.

This function tries to find densely connected subgraphs in a graph by calculating the leading non-negative eigenvector of the modularity matrix of the graph.

communities14 <- communities_network(igraph=Weights14_up$graph,
                                     community_detection='leading_eigen')

plot_communities(igraph=Weights14_up$graph,communities14$community_object)

This table summaries the memberships of the four detected modules.

Cell type enrichment of modules

Here we proceed with a EWCE analysis to check if modules are enriched in specific cell types.

com=lapply(communities14$communities,function(x){sub("\\_.*", "",x )})
com=lapply(com,function(x){sub("\\..*", "",x )})
com=lapply(com,function(x) {x[!duplicated(x)]})

cell_type14 <- cell_type_enrichment(
  multiassay =multiomics_regional,
  communities = com,
  ctd= ctd
)

cell_type14$plots

  • Module 1 is enriched for microglial genes and module 2 is enriched for astrocytes.

Community detection within multi-omic network

Let’s look at the features in more details:

Community 1 - Microglia

community_1=community_graph(igraph=Weights14_up$graph,
                            community_object=communities14$community_object,
                            community=1)
interactive_network(igraph=community_1$graph,communities=FALSE)

Community hubs

  • Features are more or less connected in the network, represented by their hub score:
print((sort(community_1$hubs, decreasing = TRUE)))
##          LAPTM5_rna             SYK_rna           ITGB2_rna           LAIR1_rna 
##         1.000000000         0.982149699         0.972346799         0.971721165 
##         PIK3AP1_rna           CSF1R_rna           VSIG4_rna           SASH3_rna 
##         0.960183008         0.953006971         0.952624008         0.951664689 
##            C1QC_rna          ADORA3_rna           C3AR1_rna           HCLS1_rna 
##         0.949014019         0.944848796         0.944807533         0.942727577 
##             SLA_rna            TLR7_rna            C1QA_rna            CD74_rna 
##         0.937214252         0.933653051         0.929875015         0.928283120 
##            CD68_rna            CD84_rna          MS4A6A_rna            MSR1_rna 
##         0.927532397         0.924678571         0.921285461         0.920621022 
##            TLR2_rna          SLC1A5_rna            SCIN_rna         HLA-DRA_rna 
##         0.908127394         0.906895335         0.902956379         0.902213789 
##              C3_rna          SLC2A5_rna           TREM2_rna            LCP1_rna 
##         0.902069624         0.901973392         0.901925764         0.900337973 
##          CLEC7A_rna           PTPRC_rna         ALOX5AP_rna         HLA-DMB_rna 
##         0.886166848         0.885844178         0.879270418         0.873682010 
##            C1QB_rna           PARVG_rna         SIGLEC8_rna           MS4A7_rna 
##         0.871849862         0.866844684         0.865898504         0.863306476 
##         TMEM119_rna        ARHGAP45_rna            CCR1_rna            CD14_rna 
##         0.858300396         0.846387610         0.843056253         0.835487918 
##           RUNX1_rna        ARHGAP30_rna           CD163_rna        SIGLEC10_rna 
##         0.831813982         0.825698973         0.822274667         0.819929730 
##           KLHL6_rna            TFEC_rna          SLAMF8_rna             SPN_rna 
##         0.813149180         0.800551700         0.785737487         0.783688227 
##            FPR3_rna          LPCAT2_rna          ADAM28_rna           CSF3R_rna 
##         0.776793900         0.772304444         0.763707387         0.760866815 
##            TMC8_rna           FCGBP_rna            CYBB_rna          FCGR1A_rna 
##         0.749254280         0.748944887         0.746216938         0.743329331 
##          PIK3R5_rna          SELPLG_rna            CD53_rna          COL8A2_rna 
##         0.743063442         0.730084497         0.721317129         0.718399474 
##           GPR34_rna          CLEC5A_rna ENSG00000287684_rna         ALOX15B_rna 
##         0.655585949         0.655576738         0.644725355         0.625379303 
## ENSG00000288156_rna            CAPG_rna           APOBR_rna          S100A4_rna 
##         0.616952019         0.569917697         0.556936344         0.555176207 
##         SIGLEC1_rna       LINC01094_rna             GEM_rna           CXCR4_rna 
##         0.523269236         0.518021686         0.482475854         0.458833513 
##          CX3CR1_rna           IL1R2_rna            AQP9_rna          P2RY12_rna 
##         0.452828319         0.391044304         0.319762213         0.301242760 
##       NAMPT_protein           GPNMB_rna    RP11-160E2.6_rna         AK4_protein 
##         0.297996842         0.168507730         0.132732212         0.130166897 
##         SNORD17_rna             NEB_rna      VPS26A_protein           MOXD1_rna 
##         0.057233309         0.054447384         0.023175182         0.014634177 
##    CH507-24F1.2_rna 
##         0.005796173
  • Top gene in module 4 is LAPTM5_rna
  • TREM2 or CLU, known AD risk genes expressed on microglia are present here
  • From prior knowledge, many of the genes in module 1 are related to microglial activation pathways

Community 2 - Astrocyte

community_2=community_graph(igraph=Weights14_up$graph,
                            community_object=communities14$community_object,
                            community=2)
interactive_network(igraph=community_2$graph,communities=FALSE)

Pathway enrichment analysis

functional_enrichment_14=list()
functional_enrichment_14=lapply(communities14$communities, function(x){
  communities_l = sub("\\_.*", "",x)
  pathway_analysis_enrichr(communities_l,plot=20)
})
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ... 
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ... 
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
## Welcome to enrichR
## Checking connection ... 
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!

Here are the biological mechanisms for the microglial module

Here are the biological mechanisms for the astrocyte module

Modules correlation to neuropathology

multiomics_modules compute the module eigen value (PC1 of scaled transcriptomics/proteomics expression) and correlates it to chosen covariates

modules_F14_eigenvalues=multiomics_modules(multiassay=multiomics_regional,
                              metadata=MOFA2::samples_metadata(integrated_object),
                              covariates=c('PHF1','amyloid'),
                              communities=communities14$communities,
                              filter_string_50=TRUE)

The slot modules_F14_eigenvalues$modules_eigen_value stores the module eigenvalues per sample, which can be correlated to any covariates of interest.

These steps were repeated for Factor 1, Factor 7, anf Factor 11, all significantly associated to AD in order to derive a comprehensive picture of mechanisms at play. These are summarised in the figure below:

Part II - Molecular subtyping in the ROSMAP cohort

Bulk proteomics and transcriptomics data from the ROSMAP cohort was collected from Synapse and processed the same way as step I.

In this part, we focus on early AD Braak III-IV donors (n=132).

multimodal <- readRDS("~/MSD/multimodal.rds")
multimodal$rna_processed=data.frame(t(multimodal$rna_processed))
multimodal$protein_processed=data.frame(t(multimodal$protein_processed))

Multiomics clustering with iCluster

iCluster is an integrative clustering framework designed to jointly analyze multi-omics datasets to uncover shared molecular patterns.By leveraging Bayesian latent variable models, iCluster identifies subgroups based on correlated features across multiple data types, here proteomics and transcriptomics.

int_clust=integrate_with_iCluster(multimodal,
                                    try.N.clust = 2:4) 
# m1=clustering_redo$multimodal_object$rna_processed
# m2=clustering_redo$multimodal_object$protein_processed
# 
# 
# model <- iClusterPlus::iClusterBayes(
#   dt1 = t(m1),
#   dt2 = t(m2),
#   dt3 = NULL, dt4 = NULL, dt5 = NULL, dt6 = NULL,
#   type = c("gaussian", "gaussian"),
#   K =1,
#   n.burnin=18000,
#   n.draw=12000,prior.gamma=c(0.5,0.5),sdev=0.05,thin=3
# )
int_clust <- readRDS("~/MSD/int_clust.rds") # load pre-trained model 

The optimal number of cluster (i.e. molecular subtypes) is 2.

Posterior probabilities in int_clust[["model"]][["beta.pp"]] indicate features driving the cluster assignment.

Let’s add the molecular subtype annotation to the metadata

cog_metadata <- readRDS("~/MSD/cog_metadata.rds")
cog_metadata$cluster=int_clust$model$clusters

Kaplan Meier analysis

First, we check for survivor bias before Kaplan-Meier analysis to ensure that differences in survival curves are not confounded by unequal follow-up durations or dropout rates across groups, which could skew the results.

keep=cog_metadata$years<=12
ggpubr::ggscatter(cog_metadata,
                  y = "years", x = "age_death",
                  add = "reg.line",  # Add regressin line
                  add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
                  conf.int = TRUE, # Add confidence interval
                  cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
                  cor.coeff.args = list(method = "pearson", label.sep = "\n")
)+  viridis::scale_color_viridis()+ylab("Number of follow-up years")+xlab("Age at death")

ggpubr::ggscatter(cog_metadata[keep,],
                  y = "years", x = "age_death",
                  add = "reg.line",  # Add regressin line
                  add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
                  conf.int = TRUE, # Add confidence interval
                  cor.coef = TRUE, # Add correlation coefficient. see ?stat_cor
                  cor.coeff.args = list(method = "pearson", label.sep = "\n")
)+  viridis::scale_color_viridis()+ylab("Number of follow-up years")+xlab("Age at death")

Based on these results, individuals with a follow-up equal or less than 12 years will be kept for Kaplan Meier analysis, and the rest will be filtered out.

Kaplan-Meier analysis to assess delay in AD onset between molecular subgroups

cog_metadata <- readRDS("~/MSD/cog_metadata.rds")
cog_metadata$time = ifelse(cog_metadata$turn==0,cog_metadata$age_death ,cog_metadata$age_bl+cog_metadata$year_onset)
# turn variable is 0 if individual never turns AD, 1 if it turns AD
#Using age as the time variable for Kaplan-Meier analysis is appropriate when the event of interest (here onset of Alzheimer's Disease) is closely tied to an individual's chronological age.

cog_metadata$time=ifelse(is.na(cog_metadata$time ),cog_metadata$age_death ,cog_metadata$time)

## for survivor bias 
keep=cog_metadata$years<=12


cfit3 <-survfit(Surv(time, turn) ~ cluster  , data = cog_metadata,subset = keep )
ggsurvplot(cfit3,
           pval = TRUE, conf.int = F,
           xlim = c(75,105),
           break.x.by = 5, 
           pval.coord = c(75,0.2),
           risk.table = TRUE, # Add risk table
           risk.table.col = "strata", # Change risk table color by groups
           linetype = "strata", # Change line type by groups
           surv.median.line = "hv", # Specify median survival
           ggtheme = theme_classic(), # Change ggplot2 theme
           palette = c("#E7B800", "#2E9FDF"),
           ylab="Remain without AD",xlab="Age of AD onset")

Kaplan-Meier survival analysis revealed that one molecular subgroup had a significantly faster prodromal progression to dementia compared to the other molecular subgroup. Cluster 2 is renamed as the fast progressing cluster

cog_metadata$cluster_group=ifelse(cog_metadata$cluster==2,"Fast","Slow")

Part III - Projecting bulk modules eigenvalues from discovery to ROSMAP cohort and assessing subtype differences

Projecting module eigenvalues from discovery to ROSMAP cohort

We project all the modules found to be signficantly associated with AD to the ROSMAP cohort. These were selected using differential eigenvalue expression which is detailed in the pre-print.

modules_list_omic <- readRDS("~/MSD/modules_list_omic.rds")
sig_modules <- readRDS("~/MSD/sig_modules.rds")

modules_list_omic =modules_list_omic[sig_modules]

# Create a new list to store only the 'Protein' vectors
protein_list <- lapply(modules_list_omic, function(x) x$Protein)
rna_list<- lapply(modules_list_omic, function(x) x$RNA)



rosmap_RNA=data.frame(t(multimodal$rna_processed))
rosmap_prot=data.frame(t(multimodal$protein_processed))
list_eigen=list()
for(module in sig_modules){
# Recalculate MEs with color labels
  rna_f=rna_list[[module]]$feature
  rna_f=rna_f[which(rna_f %in% colnames(rosmap_RNA))]
  rna=rosmap_RNA[,rna_f]
  
  prot_f=protein_list[[module]]$feature
  prot_f=prot_f[which(prot_f %in% colnames(rosmap_prot))]
  prot=rosmap_prot[,prot_f]
  df=cbind(rna,prot)
  df_s=scale(df)
  moduleColors=rep(module,ncol(df))
  moduleColors=setNames(  moduleColors,colnames(df))
  
  list_eigen[[module]] <-data.frame(individualID=rownames(df_s),WGCNA::moduleEigengenes(df_s,
                                moduleColors)$eigengenes)
}

We can now add the module eigenvalue to the ROSMAP clinical metadata

df <- Reduce(function(x, y) merge(x, y, by = "individualID", all = TRUE), list_eigen)
df_long <- df %>%
  pivot_longer(cols = -individualID, names_to = "variable", values_to = "value")

metadata_long= merge(cog_metadata,df_long, by="individualID")

Compare module eigenvalue between molecular subtypes

metadata_long$variable <- sub("ME", "", metadata_long$variable)
ggplot(metadata_long, aes(x = cluster_group, y = value, fill = cluster_group)) +
  geom_boxplot(alpha = 0.4, aes(fill = cluster_group), outlier.shape = NA) +
  geom_jitter(aes(color = cluster_group), alpha = 0.4) +    
  stat_compare_means(label = "p.signif", ref = "Slow", method = "wilcox") +
  facet_wrap(~variable, nrow = 4) +  # Set the number of rows for facets
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("#2E9FDF", "#E7B800")) +
  scale_color_manual(values = c("#2E9FDF", "#E7B800"))

There are singificant differences in module eigenvalue between the fast and slow subtypes, particularly for the microglia F14_UP_ME1 and astrocyte F14_UP_ME2 modules.

Part IV - Relevance for Alector’s TREM2 agonist clinical trial stratification

TREM2 agonists, such as AL002 developed by Alector, in phase 2 clinical trial (NCT04592874), exploit this protective role of plaque compaction in early AD, targeting individuals in early AD with mild cognitive impairment (MMSE ≥20).

On November 25th, the trial results showed that the TREM2 agonist was not effective in in slowing disease progression in individuals with early AD.

Conversely, our data indicated that strategies to instead downregulate microglial responses may be a better strategy to prevent further damage in mid-pathology stages (Braak III-IV), at a stage where microglial overactivation most likely exacerbates neuroinflammation.

cog_metadata$alector= ifelse(cog_metadata$mmse>=20,"Early mmse >=20","Later mmse<20")
cog_metadata= merge(cog_metadata,df, by="individualID")


ggplot(cog_metadata, aes(x = cluster_group, y = MEF14_UP_ME1, fill = cluster_group)) +
  geom_boxplot(alpha = 0.4, aes(fill = cluster_group), outlier.shape = NA) +
  geom_jitter(aes(color =  cluster_group), alpha = 0.4) +    
  stat_compare_means(label = "p.signif", ref="Slow", method = "wilcox") +
  facet_wrap(~alector) +scale_fill_manual(values=c( "#2E9FDF","#E7B800"))+scale_color_manual(values=c( "#2E9FDF","#E7B800"))+ylab("F14_UP_ME1 - microglia module")

The microglial module F14_UP_ME1, characteristic of microglial activation (pathways) and containing TREM2 shows elevated in fast progressing patients with mild cognitive impairment (MMSE ≥20), which suggest that reducing these levels might slow Alzheimer’s disease progression.

Conclusions

Conclusion:

This workflow demonstrates how Omix facilitates end-to-end analysis of multi-omics data, uncovering glial mechanisms in AD progression and providing insights into therapeutic strategies.

Part I - Transcriptomics-Proteomics Integration in the Discovery Cohort

The integration of transcriptomics and proteomics data using Omix successfully identified biologically meaningful multi-omics modules in Alzheimer’s disease. By leveraging MOFA for pseudotemporal multi-omics integration, we uncovered distinct biological mechanisms linked to cognitive decline and neuropathological markers. This step highlights the power of Omix in performing data preprocessing, normalisation, and integration to explore complex relationships between molecular data modalities.

Part II - Molecular Subtyping in the ROSMAP Cohort

Multi-omics clustering applied to the ROSMAP cohort stratified individuals into two molecular subgroups with distinct progression rates. Kaplan-Meier survival analysis revealed that one molecular subgroup had a significantly faster prodromal progression to dementia compared to the other molecular subgroup. These findings emphasise the heterogeneity of Alzheimer’s disease and suggest that molecular subtyping can provide clinically relevant insights into disease trajectory.

Part III - Projecting Modules and Assessing Differences

The projection of discovery cohort-derived modules to the ROSMAP cohort demonstrated robust module reproducibility and highlighted differential module activity between the fast and slow subgroups. Microglial- and astrocytic-enriched modules showed significant variability in their expression profiles, aligning with known biological differences between the two subgroups. These results suggests that different underlying molecular processes may explain a part of between-patient variability in decline.

Part IV - Relevance for Alector’s TREM2 agonist clinical trial stratification

Our findings provide a molecular explanation for the failure of the TREM2 agonist trial in slowing Alzheimer’s progression in early stages. While microglial responses may be protective in early pathology, our data suggest that downregulating microglial overactivation could be more effective in mid-pathology stages. This underscores the importance of stratified therapeutic approaches tailored to the disease stage and molecular subtype, offering a new direction for future clinical trials.